Author: RAMU MEDA

Date: 01/sep/03

 

What is internationalization?

·         Internationalization allows software to be adapted to any language and cultural convention.

·         During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture

·         Abbreviated as i18n, because there are 18 letters between the first "i" and the last "n."

What is localization?

·         Localization is the process of adapting a program for use in a specific locale.

·         Localization includes the translation of text such as GUI labels, error messages, and online help.

·          It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

·         Often abbreviated as l10n, because there are 10 letters between the "l" and the "n."

Types of data that vary with region or language:

  • Messages
  • Labels on GUI components
  • Online help
  • Sounds
  • Colors
  • Graphics
  • Icons
  • Dates
  • Times
  • Numbers
  • Currencies
  • Measurements
  • Phone numbers
  • Honorifics and personal titles
  • Postal addresses
  • Page layouts
  • Legal rules, e.g. tax calculations
  • Encryption techniques
  • Dictionary Sort Order
  • Usually most of the objects you need to isolate in a ResourceBundle are String objects. However, not all String objects are locale-specific. For example, if a String is a protocol element used by interprocess communication, it doesn't need to be localized, because the end users never see it.
  • Log file: If a log file is written by one program and read by another, both programs are using the log file as a buffer for communication, so there is no need for translation. On the other hand, if end users rarely check the log file, the cost of translation may not be worthwhile.

Characteristics of internationalized program:

  • With the addition of localization data, the same executable can run worldwide.
  • Support for new languages does not require recompilation.
  • Textual elements, such as status messages and the GUI component labels, are not hard-coded in the program. Instead they are stored outside the source code and retrieved dynamically.
  • Culturally-dependent data, such as dates and currencies, appear in formats that conform to the end user's region and language.
  • It can be localized quickly.

Locale Object:

§         A Locale object represents a specific geographical, political, or cultural region.

§          An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user. For example, displaying a number is a locale-sensitive operation--the number should be formatted according to the customs/conventions of the user's native country, region, or culture.

§         If you intend to create international Java applications, you'll definitely use the java.util.Locale class. There's no getting around it

§         You create a Locale object using one of the two constructors in this class:

o        Locale(String language, String country)

o         Locale(String language, String country, String variant)

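§         The country and variant codes are optional. To omit the country code, pass an empty String (""), not null; the constructors throw a NullPointerException for null arguments.

§         Although the Locale constructor accepts the country code in lowercase letters, it promptly converts the code to uppercase to create the correct internal representation. Likewise, the language code is converted to lowercase.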

§         The first argument to both constructors is a valid ISO Language Code. These codes are the lower-case two-letter codes as defined by ISO-639.

§         The second argument to both constructors is a valid ISO Country Code. These codes are the upper-case two-letter codes as defined by ISO-3166.

§         The second constructor requires a third argument--the Variant. The Variant codes are vendor and browser-specific.

§         Because a Locale object is just an identifier for a region, no validity check is performed when you construct a Locale.

§         If you want to see whether particular resources are available for the Locale you construct, you must query those resources. For example, ask the NumberFormat for the locales it supports using its getAvailableLocales method.

§         Note: When you ask for a resource for a particular locale, you get back the best available match, not necessarily precisely what you asked for.

§          The Locale class provides a number of convenient constants that you can use to create Locale objects for commonly used locales. For example, the following creates a Locale object for the United States:  Locale.US

§         Once you've created a Locale you can query it for information about itself.

o         Use getCountry to get the ISO Country Code and getLanguage to get the ISO Language Code.

o        You can use getDisplayCountry to get the name of the country suitable for displaying to the user. Similarly, you can use getDisplayLanguage to get the name of the language suitable for displaying to the user.

o         Interestingly, the getDisplayXXX methods are themselves locale-sensitive and have two versions: one that uses the default locale and one that uses the locale specified as an argument.
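The constructor and query methods described above can be sketched as follows. This is a minimal illustration, not a recommended pattern: the class name LocaleQueryDemo and the helper codes() are made up for this example, and newer JDKs mark the Locale constructors as deprecated, although they still work. The display names printed at the end depend on the locale data shipped with the JDK, so no output is shown for them.

```java
import java.util.Locale;

class LocaleQueryDemo {
    // Returns "language_COUNTRY" built from the ISO codes stored in the Locale.
    static String codes(Locale locale) {
        return locale.getLanguage() + "_" + locale.getCountry();
    }

    public static void main(String[] args) {
        // Language and country given explicitly (French as used in Canada).
        Locale frenchCanada = new Locale("fr", "CA");
        // An empty country String gives a language-only locale.
        Locale german = new Locale("de", "");
        // A predefined constant for a commonly used locale.
        Locale us = Locale.US;

        System.out.println(codes(frenchCanada));   // fr_CA
        System.out.println(codes(us));             // en_US
        System.out.println(german.getLanguage());  // de

        // The display methods are locale-sensitive: one overload uses the
        // default locale, the other takes the locale to display the name in.
        System.out.println(frenchCanada.getDisplayCountry());
        System.out.println(frenchCanada.getDisplayCountry(Locale.FRANCE));
        System.out.println(frenchCanada.getDisplayLanguage(Locale.US));
    }
}
```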

§         The Java 2 platform provides a number of classes that perform locale-sensitive operations.

o        the NumberFormat class formats numbers, currency, or percentages in a locale-sensitive manner.

§         NumberFormat.getInstance()

§         NumberFormat.getCurrencyInstance()

§         NumberFormat.getPercentInstance()

o        These methods have two variants; one with an explicit locale and one without; the latter using the default locale.
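As a rough sketch of the three factory methods, the example below formats the same values for explicit locales and, in the last line, for the default locale. The class and helper names are invented for this illustration; the sample values are arbitrary.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatDemo {
    static String number(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(number(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(number(1234567.891, Locale.GERMANY)); // 1.234.567,891
        System.out.println(currency(1234.5, Locale.US));         // $1,234.50
        System.out.println(percent(0.25, Locale.US));            // 25%

        // The no-argument variant uses the default locale, so the output
        // of this line depends on the machine it runs on.
        System.out.println(NumberFormat.getInstance().format(1234567.891));
    }
}
```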

§         A Locale is the mechanism for identifying the kind of object (NumberFormat) that you would like to get. The locale is just a mechanism for identifying objects, not a container for the objects themselves.

§         A variant is an optional extension to a Locale. Usually you specify variant codes to identify differences caused by the computing platform

§         The variant codes conform to no standard. They are arbitrary and specific to your application.

§         Locale-sensitive classes support only certain Locale definitions

§         Although the Java compiler and run-time environment won't complain if you make up your own language and country identifiers, you should use the valid codes defined by ISO standards

§         When the Java Virtual Machine (JVM) starts up, it queries the underlying OS for a default-locale setting. You can discover your default locale programmatically.

§         In a Java application, each locale-sensitive object is responsible for its own locale-dependent behavior. A Locale object doesn't enforce this behavior; it simply acts as an indicator to other objects. Those objects are then responsible for using the Locale appropriately.

§         By design, locale-sensitive classes are independent of each other. That is, the set of supported Locales in one class does not need to be the same as the set in another class.

§         A Java application can have multiple locales active at the same time. That is, it's possible to use a French date format and a U.S. number format in the same application. Nothing limits you from creating truly multicultural and multilingual Java applications. You can assign a different Locale to every locale-sensitive object in your program. This flexibility allows you to develop multilingual applications, which can display information in multiple languages.

§         Scope of a Locale: On the Java platform you do not specify a global Locale by setting an environment variable before running the application. Instead you either rely on the default Locale or assign a Locale to each locale-sensitive object.
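To illustrate the last few points, the sketch below reads the default locale and then gives two locale-sensitive objects different locales in the same program. The class and method names are hypothetical. The French date text depends on the current date and on the JDK's locale data, so no output is shown for it.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

class MultiLocaleDemo {
    // A date formatter that always uses French conventions.
    static String frenchDate(Date date) {
        return DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE).format(date);
    }

    // A number formatter that always uses U.S. conventions.
    static String usNumber(double value) {
        return NumberFormat.getInstance(Locale.US).format(value);
    }

    public static void main(String[] args) {
        // The default locale the JVM picked up from the OS at startup.
        System.out.println("Default locale: " + Locale.getDefault());

        // Each locale-sensitive object carries its own Locale.
        System.out.println(frenchDate(new Date()));
        System.out.println(usNumber(1234.5)); // 1,234.5
    }
}
```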

Resource Bundle:

·         Resource bundles contain locale-specific objects. When your program needs a locale-specific resource, a String for example, it can load it from the resource bundle that is appropriate for the current user's locale.

·         A ResourceBundle is an example of a locale-sensitive object.

·         This allows you to write programs that can:

§         be easily localized, or translated, into different languages

§         handle multiple locales at once

§         be easily modified later to support even more locales

·         One resource bundle is, conceptually, a set of related classes that inherit from ResourceBundle. Each related subclass of ResourceBundle has the same base name plus an additional component that identifies its locale.

·         Each related subclass of ResourceBundle contains the same items, but the items have been translated for the locale represented by that ResourceBundle subclass.

·         In general, the objects stored in a ResourceBundle are predefined and ship with the product. These objects are not modified while the program is running

·         When your program needs a locale-specific object, it loads the ResourceBundle class using the getBundle method:

§          ResourceBundle myResources = ResourceBundle.getBundle("MyResources", currentLocale);

§         The first argument specifies the family name of the resource bundle that contains the object in question. The second argument indicates the desired locale. getBundle uses these two arguments to construct the name of the ResourceBundle subclass it should load as follows.

·         The resource bundle lookup searches for classes with various suffixes on the basis of

§         the desired locale and

§         the current default locale as returned by Locale.getDefault(), and

§         the root resource bundle (baseclass),

In the following order from lower-level (more specific) to parent-level (less specific):

baseclass + "_" + language1 + "_" + country1 + "_" + variant1
baseclass + "_" + language1 + "_" + country1 + "_" + variant1 + ".properties"
baseclass + "_" + language1 + "_" + country1
baseclass + "_" + language1 + "_" + country1 + ".properties"
baseclass + "_" + language1
baseclass + "_" + language1 + ".properties"
baseclass + "_" + language2 + "_" + country2 + "_" + variant2
baseclass + "_" + language2 + "_" + country2 + "_" + variant2 + ".properties"
baseclass + "_" + language2 + "_" + country2
baseclass + "_" + language2 + "_" + country2 + ".properties"
baseclass + "_" + language2
baseclass + "_" + language2 + ".properties"
baseclass
baseclass + ".properties"
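The search order listed above can be sketched as a small name generator. This is only an illustration of the order described in these notes, not the JDK's actual implementation, which differs in detail between versions. The class BundleLookupSketch and its methods are hypothetical, and each generated name stands for both a class and a ".properties" file. As a simplification, a locale with an empty language contributes no names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BundleLookupSketch {
    // Adds a name once, keeping the first (most specific) position.
    private static void add(List<String> out, String name) {
        if (!out.contains(name)) {
            out.add(name);
        }
    }

    // Adds base_lang_COUNTRY_variant, base_lang_COUNTRY and base_lang for one
    // locale, most specific first, skipping parts that are empty.
    private static void addCandidates(List<String> out, String baseName, Locale locale) {
        String language = locale.getLanguage();
        String country = locale.getCountry();
        String variant = locale.getVariant();
        if (language.isEmpty()) {
            return; // simplification: no language, no candidates
        }
        String withLanguage = baseName + "_" + language;
        if (!country.isEmpty()) {
            String withCountry = withLanguage + "_" + country;
            if (!variant.isEmpty()) {
                add(out, withCountry + "_" + variant);
            }
            add(out, withCountry);
        }
        add(out, withLanguage);
    }

    // Desired locale first, then the default locale, then the root bundle.
    static List<String> candidateNames(String baseName, Locale desired, Locale defaultLocale) {
        List<String> names = new ArrayList<String>();
        addCandidates(names, baseName, desired);
        addCandidates(names, baseName, defaultLocale);
        add(names, baseName);
        return names;
    }

    public static void main(String[] args) {
        // Prints [MyResources_fr_CH, MyResources_fr, MyResources_en_US, MyResources_en, MyResources]
        System.out.println(candidateNames("MyResources", new Locale("fr", "CH"), Locale.US));
    }
}
```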

§         The baseclass must be fully qualified (for example, myPackage.MyResources, not just MyResources). It must also be accessible by your code; it cannot be a class that is private to the package where ResourceBundle.getBundle is called.

§         Resource bundles contain key/value pairs. The keys uniquely identify a locale-specific object in the bundle. Here's an example of a ListResourceBundle that contains two key/value pairs:

 import java.util.ListResourceBundle;

 class MyResource extends ListResourceBundle {
       // Returns the key/value pairs held by this bundle.
       public Object[][] getContents() {
               return contents;
       }
       static final Object[][] contents = {
       // LOCALIZE THIS
               {"OkKey", "OK"},
               {"CancelKey", "Cancel"},
       // END OF MATERIAL TO LOCALIZE
       };
 }
 
  • Keys are always Strings. In this example, the keys are OkKey and CancelKey. In the above example, the values are also Strings--OK and Cancel--but they don't have to be. The values can be any type of object.

§         You retrieve an object from a resource bundle using the appropriate getter method, for example: button1 = new Button(myResourceBundle.getString("OkKey"));

§         The getter methods all require the key as an argument and return the object if found. If the object is not found, the getter method throws a MissingResourceException.

§         Besides getString, ResourceBundle supports a number of other methods for getting different types of objects, such as getStringArray. If you don't have an object that matches one of these methods, you can use getObject and cast the result to the appropriate type.
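The getter methods can be tried out without any lookup by instantiating a bundle directly. The sketch below assumes a made-up bundle class Labels with illustrative keys, and a hypothetical helper label() that turns a MissingResourceException into a visible placeholder.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

class BundleAccessDemo {
    // A small bundle whose values are not all Strings.
    static class Labels extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] {
                {"OkKey", "OK"},
                {"CancelKey", "Cancel"},
                {"Weekdays", new String[] {"Mon", "Tue"}},
                {"MaxRetries", Integer.valueOf(3)},
            };
        }
    }

    // Returns the String for the key, or "!key!" when the key is missing.
    static String label(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return "!" + key + "!";
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new Labels();
        System.out.println(label(bundle, "OkKey"));     // OK
        System.out.println(label(bundle, "NoSuchKey")); // !NoSuchKey!

        // Typed getter for arrays, and getObject plus a cast for other types.
        String[] days = bundle.getStringArray("Weekdays");
        Integer maxRetries = (Integer) bundle.getObject("MaxRetries");
        System.out.println(days.length + " " + maxRetries); // 2 3
    }
}
```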

§         You should always supply a baseclass with no suffixes. This will be the class of "last resort", if a locale is requested that does not exist. In fact, you must provide all of the classes in any given inheritance chain that you provide a resource for. For example, if you provide MyResources_fr_BE, you must provide both MyResources and MyResources_fr or the resource bundle lookup won't work right.

§         The Java 2 platform provides two subclasses of ResourceBundle, ListResourceBundle and PropertyResourceBundle, that provide a fairly simple way to create resources. ListResourceBundle manages its resource as a List of key/value pairs.

§         PropertyResourceBundle uses a properties file to manage its resources.

§         If ListResourceBundle or PropertyResourceBundle do not suit your needs, you can write your own ResourceBundle subclass. Your subclasses must override two methods: handleGetObject and getKeys().

§         In a ListResourceBundle, the keys must be String objects. In a PropertyResourceBundle, both the keys and the values must be String objects.

§         You can organize your ResourceBundle objects according to the category of objects they contain. For example, you might want to load all of the GUI labels for an order entry window into a ResourceBundle called OrderLabelsBundle.

o        Advantages: easier to read and maintain; faster to load into memory; reduced memory usage, because only the required bundle is loaded.
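For the case where neither ListResourceBundle nor PropertyResourceBundle fits, a custom subclass might look roughly like the sketch below. MapResourceBundle and its put() method are hypothetical names for this example. As a simplification, getKeys() returns only this bundle's own keys and ignores any parent bundle.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.ResourceBundle;

class MapResourceBundle extends ResourceBundle {
    private final Map<String, Object> entries = new HashMap<String, Object>();

    // Convenience method for filling the bundle; not part of ResourceBundle.
    MapResourceBundle put(String key, Object value) {
        entries.put(key, value);
        return this;
    }

    // Returning null tells ResourceBundle to consult the parent bundle, if any.
    @Override
    protected Object handleGetObject(String key) {
        return entries.get(key);
    }

    // Simplification: only the keys stored in this bundle are listed.
    @Override
    public Enumeration<String> getKeys() {
        return Collections.enumeration(entries.keySet());
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new MapResourceBundle()
                .put("OkKey", "OK")
                .put("CancelKey", "Annuler");
        System.out.println(bundle.getString("CancelKey")); // Annuler
    }
}
```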

InputStreamReader

§         An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset.

§         The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.

§          Each invocation of one of an InputStreamReader's read() methods may cause one or more bytes to be read from the underlying byte-input stream.

§         To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.

§         For top efficiency, consider wrapping an InputStreamReader within a BufferedReader. For example:  BufferedReader in   = new BufferedReader(new InputStreamReader(System.in));
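A small sketch of why the charset matters when decoding: the same bytes are read twice, once as UTF-8 and once as ISO-8859-1. The class DecodeDemo and the helper firstLine() are invented for this example, and an in-memory byte array stands in for a file or socket.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class DecodeDemo {
    // Reads the first line from a byte stream, decoding with the named charset.
    static String firstLine(InputStream bytes, String charsetName) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(bytes, charsetName));
        try {
            return in.readLine();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // The bytes 0xC3 0xA9 are the UTF-8 encoding of U+00E9 (e with acute).
        byte[] data = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, '\n'};

        String utf8 = firstLine(new ByteArrayInputStream(data), "UTF-8");
        String latin1 = firstLine(new ByteArrayInputStream(data), "ISO-8859-1");

        System.out.println(utf8.length());   // 4: the two bytes decode to one character
        System.out.println(latin1.length()); // 5: each byte decodes to its own character
    }
}
```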

OutputStreamWriter

o        An OutputStreamWriter is a bridge from character streams to byte streams:

o         Characters written to it are encoded into bytes using a specified charset.

o        The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.

o        Each invocation of a write() method causes the encoding converter to be invoked on the given character(s). The resulting bytes are accumulated in a buffer before being written to the underlying output stream. The size of this buffer may be specified, but by default it is large enough for most purposes. Note that the characters passed to the write() methods are not buffered.

o        For top efficiency, consider wrapping an OutputStreamWriter within a BufferedWriter so as to avoid frequent converter invocations. For example: 

o        Writer out   = new BufferedWriter(new OutputStreamWriter(System.out));

o        A surrogate pair is a character represented by a sequence of two char values: A high surrogate in the range '\uD800' to '\uDBFF' followed by a low surrogate in the range '\uDC00' to '\uDFFF'. If the character represented by a surrogate pair cannot be encoded by a given charset then a charset-dependent substitution sequence is written to the output stream.

o        A malformed surrogate element is a high surrogate that is not followed by a low surrogate or a low surrogate that is not preceded by a high surrogate.

o        It is illegal to attempt to write a character stream containing malformed surrogate elements. The behavior of an instance of this class when a malformed surrogate element is written is not specified.
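The encoding direction can be sketched the same way. The example below writes the same text through an OutputStreamWriter with two different charsets and counts the resulting bytes, then does the same for a character that needs a surrogate pair. EncodeDemo and encode() are hypothetical names, and the output goes to an in-memory buffer rather than a real stream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class EncodeDemo {
    // Encodes the text with the named charset and returns the bytes written.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new BufferedWriter(new OutputStreamWriter(bytes, charsetName));
        try {
            out.write(text);
        } finally {
            out.close(); // flushes buffered characters and encoded bytes
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";
        System.out.println(encode(text, "ISO-8859-1").length); // 4
        System.out.println(encode(text, "UTF-8").length);      // 5

        // U+1D11E (musical symbol G clef) is stored as a surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());                             // 2
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true
        System.out.println(encode(clef, "UTF-8").length);              // 4
    }
}
```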

Properties:

·         The Properties class represents a persistent set of properties.

·         The Properties can be saved to a stream or loaded from a stream.

·         Each key and its corresponding value in the property list is a string.

·         A properties file stores information about the characteristics of a program or environment, including internationalization/localization information.

·         A properties file is in plain-text format.

·         The keys in a properties file must not change, because they will be referenced when your program fetches the translated text.

·         A property list can contain another property list as its "defaults"; this second property list is searched if the property key is not found in the original property list.

·         Because Properties inherits from Hashtable, the put and putAll methods can be applied to a Properties object. Their use is strongly discouraged as they allow the caller to insert entries whose keys or values are not Strings. The setProperty method should be used instead.

·          If the store or save method is called on a "compromised" Properties object that contains a non-String key or value, the call will fail.

·         When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence.

·         The native2ascii tool can be used to convert property files to and from other character encodings.

·         By creating a Properties object and using the load method a program can read a properties file. The program can then access the values by using the key as follows:

o        Properties props = new Properties();

o        props.load(new BufferedInputStream(new FileInputStream("filename")));

o        String value = props.getProperty("key");

·         Alternatively, system properties can be specified on the command line at application startup time, e.g. java -Dmy.property=value MyApplication, and read with System.getProperty("my.property").

·         If the key is not found, getProperty returns null.

·         PropertyResourceBundle is backed by a set of properties files. ListResourceBundle is backed by a class file.
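The loading, defaults and escape behavior described in this section can be sketched as follows. PropertiesDemo and its load() helper are hypothetical, the keys are made up, and a String stands in for the contents of a properties file so the example needs no file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

class PropertiesDemo {
    // Loads properties-file text on top of an optional defaults list.
    static Properties load(String fileContents, Properties defaults) throws IOException {
        Properties props = new Properties(defaults);
        // load(InputStream) expects ISO 8859-1 encoded bytes.
        props.load(new ByteArrayInputStream(fileContents.getBytes("ISO-8859-1")));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties defaults = new Properties();
        defaults.setProperty("greeting", "Hello");
        defaults.setProperty("farewell", "Goodbye");

        // The doubled backslash in this literal yields the six-character
        // Unicode escape for e-acute, as it would appear in the file.
        String french = "# French\ngreeting=Bonjour\ncafe=caf\\u00e9\n";
        Properties props = load(french, defaults);

        System.out.println(props.getProperty("greeting"));      // Bonjour
        System.out.println(props.getProperty("farewell"));      // Goodbye (from the defaults)
        System.out.println(props.getProperty("cafe").length()); // 4 (escape decoded to one character)
        System.out.println(props.getProperty("missing"));       // null
    }
}
```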

Package java.text

·         Provides classes and interfaces for handling text, dates, numbers, and messages in a manner independent of natural languages. This means your main application or applet can be written to be language-independent, and it can rely upon separate, dynamically-linked localized resources. This allows the flexibility of adding new localizations at any time.

·         All classes in the java.text package are locale-sensitive.

·         These classes are capable of

o        formatting and parsing dates, numbers, and messages;

o        searching and sorting strings;

o        iterating over characters, words, sentences, and line breaks.

·         This package contains three main groups of classes and interfaces:

o        Classes for iteration over text

o        Classes for formatting and parsing

o        Classes for string collation

·         A CollationKey represents a String under the rules of a specific Collator object.

·         The Collator class performs locale-sensitive String comparison

·         An Annotation object is used as a wrapper for a text attribute value if the attribute has annotation characteristics.

·         Use the BreakIterator class only with natural-language text. To tokenize a programming language, use the StreamTokenizer class.


Unicode

·         Unicode is an international effort to provide a single character set that everyone can use.

·         Java uses the Unicode 2.0 (or 2.1) character encoding standard.

·         In the Java programming language, char values represent Unicode characters as 16-bit values. Unicode supports the world's major languages.

·          In this 16-bit encoding, every char value occupies two bytes (characters outside the 16-bit range are represented by surrogate pairs). Ranges of character encodings represent different writing systems or other special symbols. For example, Unicode characters in the range 0x0000 through 0x007F represent the basic Latin alphabet, and characters in the range 0x4E00 through 0x9FFF represent the Han characters used in China, Japan, Korea, Taiwan, and Vietnam.

·         UTF-8 is a multibyte encoding format, which stores some characters as one byte and others as two or three bytes. If most of your data is ASCII characters, it is more compact than the 16-bit encoding, but in the worst case, a UTF-8 string can be 50 percent larger than the corresponding 16-bit string. Overall, it is fairly efficient.

·         Despite the advantages of Unicode, there are some drawbacks: Unicode support is limited on many platforms because of the lack of fonts capable of displaying all the Unicode characters.

·         UTF-8 stands for Unicode (or UCS) Transformation Format, 8-bit encoding form. It is a transmission format for Unicode that is suitable for use with many network protocols and UNIX file systems.
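The size trade-off between UTF-8 and the 16-bit representation can be checked with a few lines of code. The sketch below uses java.nio.charset.StandardCharsets, which assumes Java 7 or later; the class and helper names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizeDemo {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-16BE is used so that no byte-order mark is added to the count.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String han = "\u4E2D\u6587"; // two Han characters

        // ASCII text: UTF-8 is half the size of the 16-bit form.
        System.out.println(utf8Length(ascii) + " vs " + utf16Length(ascii)); // 5 vs 10
        // Han text: UTF-8 needs three bytes per character, 50 percent more.
        System.out.println(utf8Length(han) + " vs " + utf16Length(han));     // 6 vs 4
    }
}
```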

Annotation

·         An Annotation object is used as a wrapper for a text attribute value if the attribute has annotation characteristics. These characteristics are:

o        The text range that the attribute is applied to is critical to the semantics of the range. That means, the attribute cannot be applied to subranges of the text range that it applies to, and, if two adjacent text ranges have the same value for this attribute, the attribute still cannot be applied to the combined range as a whole with this value.

o        The attribute or its value usually no longer applies if the underlying text is changed.

CollationKey

·         A CollationKey represents a String under the rules of a specific Collator object.

·         Comparing two CollationKeys returns the relative order of the Strings they represent.

·         Using CollationKeys to compare Strings is generally faster than using Collator.compare. Thus, when the Strings must be compared multiple times, for example when sorting a list of Strings, it's more efficient to use CollationKeys.

·         You cannot create CollationKeys directly. Rather, generate them by calling Collator.getCollationKey.

·         You can only compare CollationKeys generated from the same Collator object.

·         Generating a CollationKey for a String involves examining the entire String and converting it to a series of bits that can be compared bitwise. This allows fast comparisons once the keys are generated.

·         The cost of generating keys is recouped in faster comparisons when Strings need to be compared many times.

·          Collator.compare examines only as many characters as it needs which allows it to be faster when doing single comparisons.

Collator

·         The Collator class performs locale-sensitive String comparison.

·         Use this class to build searching and sorting routines for natural language text.

·         Collator is an abstract base class. Subclasses implement specific collation strategies. You can use the static factory method, getInstance, to obtain the appropriate Collator object for a given locale.

·         The Character comparison methods use the Unicode standard to identify character properties.
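The two sorting approaches described in the CollationKey and Collator sections can be compared side by side. In the sketch below, CollationDemo and its methods are hypothetical names and the word list is arbitrary; the plain String sort is included only to show how it differs from locale-sensitive ordering.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationDemo {
    // Sorts a copy of the array using Collator.compare directly.
    static String[] sortWithCollator(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Converts each String to a CollationKey once, then sorts the keys.
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey implements Comparable
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple"};

        String[] plain = words.clone();
        Arrays.sort(plain); // compares char values: upper case sorts first
        System.out.println(Arrays.toString(plain));                               // [Banana, apple, cherry]
        System.out.println(Arrays.toString(sortWithCollator(words, Locale.US))); // [apple, Banana, cherry]
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));     // [apple, Banana, cherry]
    }
}
```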

Character Encoding: A character encoding is a mapping between characters and code values.

Input method:

·         Lets users enter thousands of different characters using keyboards with far fewer keys.

·         The user may have input methods for different languages, or input methods that accept various types of input.

·         Input method framework: enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods.

Scenarios and solutions:

Scenario: You need to find a localized value for a given key, for example, an error message.
Solution: Use java.util.Properties to load values from a stream (e.g. a java.io.FileInputStream) and then use a single lookup key to obtain a localized value.

Scenario: You need to format and present numbers and currencies.
Solution: Use java.text.NumberFormat.

Scenario: You need to format and present dates and times.
Solution: Use java.text.DateFormat.

Scenario: You need to order and handle text data.
Solution: Use Collator and CollationKey for ordering, and MessageFormat, ResourceBundle, or PropertyResourceBundle to handle text.

Scenario: You need to read and write files.
Solution: Use InputStreamReader for reading and OutputStreamWriter for writing.

Scenario: You need to create localized JSPs.
Solution: Use Locale and the contentType and pageEncoding attributes.

Scenario: You need to create localized servlets.
Solution: Use Locale and the ServletResponse.setContentType() and ServletResponse.setLocale() methods.

Scenario: You are developing an application that will only execute in a single and very narrow geographic location.
Solution: There is no need to develop the application using Java's internationalization features.

Scenario: You are creating an application for a company with offices in several countries and time zones. Where possible, the application needs to adapt its functionality and presentation to local customs and language.
Solution: Use Java's internationalization features to develop this application.

Scenario: Converting a byte stream to a character stream (or a locale-specific encoding to Unicode).
Solution: InputStreamReader

Scenario: Converting character streams to byte streams (or Unicode to a region-specific encoding).
Solution: OutputStreamWriter

Scenario: Locale-sensitive string/character comparisons and sorting.
Solution: Use a Collator object.

Scenario: Repeated searching and sorting of strings.
Solution: Use the CollationKey class.

Scenario: Isolating localizable elements from the rest of the application.
Solution: ResourceBundle object

Scenario: The bundle contains String objects that need to be translated into various languages.
Solution: Use a PropertyResourceBundle object.

Scenario: Formatting a compound message in a locale-independent manner.
Solution: Construct a pattern that you apply to a MessageFormat object, and store this pattern in a ResourceBundle.

Scenario: Detecting character, word, sentence and line boundaries.
Solution: BreakIterator class


java.text.NumberFormat

  • Provides support for parsing/formatting numbers, currency and percentages in a locale-specific manner using pre-defined patterns
  • NumberFormat.getNumberInstance(LOCALE).format(NUM)
  • NumberFormat.getCurrencyInstance(LOCALE).format(NUM)
  • NumberFormat.getPercentInstance(LOCALE).format(NUM)

 

java.text.DecimalFormat

  • Provides support for custom parsing/formatting of numbers using format patterns
  • ‘#’ is used to specify digits, ‘,’ for grouping and ‘.’ for decimal points
  • ‘0’ is used to specify digits with leading zeros
  • “123456.789” with the pattern “0,000,000.##” results in “0,123,456.79” (in the integer part of a pattern, ‘#’ placeholders may only appear before ‘0’ placeholders)
  • output symbols can be changed – e.g. ‘.’ can be rendered as any requested character
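A short sketch of custom patterns and replaceable symbols. DecimalFormatDemo and its format() helper are invented for this illustration, and the symbols are taken from Locale.US so that the output does not depend on the machine's default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class DecimalFormatDemo {
    // Formats a value with an explicit pattern and symbol set.
    static String format(String pattern, DecimalFormatSymbols symbols, double value) {
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        DecimalFormatSymbols us = new DecimalFormatSymbols(Locale.US);

        // '#' digits are optional, '0' digits are always shown.
        System.out.println(format("#,##0.00", us, 123456.789)); // 123,456.79
        System.out.println(format("000000.0", us, 42.5));       // 000042.5

        // The same pattern rendered with swapped separator symbols.
        DecimalFormatSymbols custom = new DecimalFormatSymbols(Locale.US);
        custom.setDecimalSeparator(',');
        custom.setGroupingSeparator('.');
        System.out.println(format("#,##0.00", custom, 123456.789)); // 123.456,79
    }
}
```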

 

java.text.DateFormat

  • Provides support for parsing/formatting dates and times in a locale-specific manner using predefined patterns. The length of the output can be controlled, e.g. DEFAULT, SHORT, MEDIUM, LONG, FULL
  • DateFormat.getDateInstance(DateFormat.DEFAULT, LOCALE).format(DATE)
  • DateFormat.getTimeInstance(DateFormat.DEFAULT, LOCALE).format(DATE)
  • DateFormat.getDateTimeInstance(DateFormat.DEFAULT, DateFormat.DEFAULT, LOCALE).format(DATE)

 

java.text.SimpleDateFormat

  • Provides support for custom parsing/formatting of dates/times using format patterns
  • E.g. pattern “dd/MM/yy HH:mm:ss” results in “06/03/02 02:06:30”
  • For correct rendering of dates and times, use locale + pattern (a pattern on its own could lead to inconsistent formatting in other languages)
  • date symbols can be changed (e.g. “Mon” can be changed to “MON”)
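The predefined styles of DateFormat and the custom patterns of SimpleDateFormat can be compared on one fixed date. In the sketch below the class and helper names are hypothetical, and the time zone is pinned to UTC so the result does not depend on where the code runs. Outputs are shown only for Locale.US; the French text depends on the JDK's locale data.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // 6 March 2002, 02:06:30 UTC.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(UTC, Locale.US);
        cal.clear();
        cal.set(2002, Calendar.MARCH, 6, 2, 6, 30);
        return cal.getTime();
    }

    // Formats the date part using one of the predefined styles.
    static String predefined(int style, Locale locale, Date date) {
        DateFormat df = DateFormat.getDateInstance(style, locale);
        df.setTimeZone(UTC);
        return df.format(date);
    }

    // Formats using a custom pattern plus a locale for names and symbols.
    static String custom(String pattern, Locale locale, Date date) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, locale);
        sdf.setTimeZone(UTC);
        return sdf.format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        System.out.println(predefined(DateFormat.LONG, Locale.US, date));   // March 6, 2002
        System.out.println(predefined(DateFormat.LONG, Locale.FRANCE, date));
        System.out.println(custom("dd/MM/yy HH:mm:ss", Locale.US, date));   // 06/03/02 02:06:30
        System.out.println(custom("EEE, d MMM yyyy", Locale.US, date));     // Wed, 6 Mar 2002
        System.out.println(custom("EEE, d MMM yyyy", Locale.FRANCE, date));
    }
}
```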

java.text.MessageFormat

  • provides support for template based rendering in a locale-specific manner using a pattern string and an array of arguments – similar to placeholders in SQL PreparedStatement
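A minimal sketch of pattern-plus-arguments rendering. The pattern text, the class MessageFormatDemo and the helper render() are made up for this example; in a real application the pattern would typically come from a ResourceBundle.

```java
import java.text.MessageFormat;
import java.util.Locale;

class MessageFormatDemo {
    // Fills the numbered placeholders of the pattern using the given locale.
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        String pattern = "{0} has {1,number,integer} new messages";

        // The number argument is formatted according to the locale.
        System.out.println(render(pattern, Locale.US, "Ana", 1234));      // Ana has 1,234 new messages
        System.out.println(render(pattern, Locale.GERMANY, "Ana", 1234)); // Ana has 1.234 new messages
    }
}
```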

 

java.text.BreakIterator

  • provides support for identifying breaks (by character, word, sentence or line) in text in a locale-specific manner
  • getCharacterInstance(), getWordInstance(), getSentenceInstance(), getLineInstance()
  • Iterate with first() and next() until next() returns BreakIterator.DONE, e.g. while (iterator.next() != BreakIterator.DONE)
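The first()/next() loop can be sketched as a small word extractor. WordBreakDemo and words() are hypothetical names. Because word boundaries also delimit spaces and punctuation, the sketch keeps only segments that start with a letter or digit, which is a simplification chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBreakDemo {
    // Returns the words of the text, found by walking word boundaries.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Skip segments made of spaces or punctuation.
            if (Character.isLetterOrDigit(text.charAt(start))) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, How, are, you]
        System.out.println(words("Hello, world! How are you?", Locale.US));
    }
}
```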